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ABSTRACT 


Two  methods  are  presented  for  building  Interval  estimates  on  the  mean  of 
a  stationary  stochastic  process.  Both  methods  fit  an  autoregressive  moving- 
average  (ARMA)  model  to  observations  on  the  process.  The  model  Is  used  to 
estimate  the  variance  of  the  sample  mean  and  the  applicable  degrees  of  free¬ 
dom  of  the  t  statistic.  Fitting  of  the  ARMA  model  Is  totally  automated.  Ihe 
ARMA-based  confidence  Intervals  perform  well  with  data  generated  from  ARMA 
processes.  With  data  generated  from  queuing-system  simulations,  the  coverage 
of  the  confidence  Intervals  Is  less  than  satisfactory.  It  Is  shown  that  with 
queuing-system  data,  the  sample  mean  and  Its  estimated  standard  deviation  are 
strongly  positively  correlated,  and  that  the  residuals  of  the  fitted  models 
are  not  normally  distributed.  These  factors  contribute  adversely  to  the 
coverage  of  the  confidence-interval  procedures  with  queuing  data. 


L':  ^  j  Aj  [^1 

^  [;i 

-  on _ _ 


-1- 


Ue  Introduce  and  test  two  confidence  Interval  procedures  (ClVa)  for  the 
mean  of  a  univariate  output  random  variable  from  a  slnulation  model  operating 
at  steady  state*  The  CII^  are  based  on  an  autoregressive  moving-average 
(ARMA)  model  and  are  f Ixed-sample-slze  procedures.  The  threefold  purpose  of 
this  research  has  been: 

1.  to  develop  two  versions  of  an  ARMA-based 
confidence-interval  procedure; 

2.  to  measure  the  effectiveness  of  both  versions 

by  subjecting  them  to  comprehensive  testing;  and 

3.  to  develop  and  report  guidelines  for  using 
this  procedure  with  output  data  from  slnulation 
models . 

As  used  In  this  report,  a  confidence-interval  procedure  consists  of  four 
steps: 

a.  coiqputatlon  of  the  sample  mean; 

b.  estimation  of  the  variance  of  the  sample  mean; 

c.  determination  of  the  number  of  degrees  of  freedom;  and 

d.  confutation  of  an  Interval  estimate  for  the  process  mean, 
using  the  t  distribution  with  the  aforementioned 
quantities* 

These  four  steps  correspond  to  the  first  four  steps  given  In  Fishman  [1978,  p 
236]  for  forming  Interval  estimates.  After  reviewing  the  pertinent  funda¬ 
mentals  of  ARMA  processes  In  Section  1,  the  steps  making  up  the  ARMA-based 
confidence-interval  procedure  are  explained  In  detail  In  Section  2* 

The  proposed  Gift  have  been  subjected  to  comprehensive  empirical  test'^» 
using  the  research  framework  suggested  for  this  purpose  In  Schrlber  and 
Andrews  [1981].  Bnpirlcal  testing  of  a  GIF  involves  the  generation  of  data 
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from  a  series  of  theoretical  output  processes  (TOPS)  with  known  means.  In 
Section  3,  we  discuss  the  eight  TOPs  used  to  evaluate  the  CIPs  proposed 
here,  and  Indicate  the  reasons  these  TOPs  were  chosen  for  testing  purposes. 

After  a  brief  discussion  of  the  testing  environment  In  Section  4,  the 
empirical  results  of  the  testing  are  given  In  Section  5.  For  each  TOP  used, 
the  resulting  measures  of  effectiveness  (MOEs)  are  presented  In  a  corre¬ 
sponding  table  of  the  form  Introduced  In  Schrlber  and  Andrews  [1981].  1he 
tables  display  the  performance  characteristics  of  the  CIP  when  used  to  process 
data  generated  by  the  associated  TOPs. 

Both  CIPs  perform  well  with  data  generated  by  ARMA  TOPs,  which  In  this 
research  are  tailor-made  (Schrlber  and  Andrews  [1981]).  However,  when  used  to 
process  observations  produced  by  models  of  two  queuing  systems,  the  ARMA-based 
CIPs  did  not  perform  satisfactorily  for  all  the  measures  of  effectiveness. 
Nevertheless,  they  did  as  well  as  or  better  than  the  pure  autoregressive  (AR) 
confidence-interval  procedure  presented  by  Fishman  [1971]  and  further 
Investigated  by  Andrews  and  Schrlber  [1978]. 

In  Section  5,  we  also  Investigate  possible  reasons  for  the  failure  of 
these  CIPs  to  perform  In  better  fashion  on  data  generated  by  the  queuing- 
system  models.  Ihls  Investigation  takes  the  form  of  empirically  determining 
the  extent  to  which  the  underlying  assumptions  were  satisfied  by  the  queuing 
system  data.  Ihls  discussion  concludes  with  the  recommendation  that  the 
ARMA-based  CIP  be  used  with  queuing  data  only  after  the  data  have  been  tested 
appropriately.  In  particular,  a  test  should  be  performed  to  see  If  there  Is  a 
significant  correlation  between  the  sample  mean  and  the  estimated  standard 
deviation  of  the  saiq>le  mean.  Furthermore,  the  distribution  of  the  residuals 
should  be  tested  for  normality. 
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1.  AUTOREGRESSIVE  MOVING-AVERAGE  MODEL 
The  autoregressive  moving-average  model  on  which  the  confidence-interval 
procedure  Is  based  Is  given  by  (1): 
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This  la  the  familiar  model  as  given  in  Box  and  Jenkins  fl976].  It  is  referred 
to  as  ARMA(p,q).  When  q  =  0,  the  model  is  the  pure  autoregressive  (AR) 
process  which  Fishman  [1971]  used  as  the  basis  for  a  confidence-interval 
methodology.  Here,  we  allow  for  the  presence  of  moving-average  (MA)  terms, 
thereby  extending  the  pure  AR  model  to  a  mixed  AR-MA  form  with  the  objective 
of  achieving  an  Improved  confidence-interval  methodolo^  for  those  cases  in 
which  MA  terms  are  of  Importance. 

Further  motivation  for  developing  an  ARMA-based  CIP  Is  provided  by  Steudel 
and  Wu  [1977,  p.  748],  who  state  that  "...any  uniformly  sampled  wlde-sense 
stationary  stochastic  process  can  be  adequately  described  by  a  discrete  auto¬ 
regressive  moving-average  (ARMA)  model  of  order  n  and  n-l."  On  the  basis  of 
their  limited  empirical  results,  Steudel  and  Wu  tentatively  conclude,  for 
example,  that  the  "current  system  content"  output  variable  for  an  M/M/1 
queuing  system  is  adequately  modeled  by  an  ARMA(I,0)  model.  In  a  companion 
paper,  Steudel  et  al.  [1978,  p.  292]  conclude  that  "Queue  behavior  Is  shown 
to  be  adequately  described  by  a  first  order  autoregressive  AR(1)  model  if  the 


job  selection  discipline  does  not  depend  on  operation  processing  time.  In 


those  cases  where  processing  time  was  used  In  job  selection,  a  second  order 
autoregressive  AR(2)  model  is  adequate  to  characterize  the  queues."  And 
Schmelser  and  Kang  [1981]  have  shown  analytically  that  when  batch  means  for 
any  batch  size  are  formed  for  an  AR(1)  process,  the  resulting  process  Is 
ARMA(1,1). 

The  confidence-interval  procedures  we  propose  are  for  output  processes 
which  have  reached  steady  state  or,  equivalently,  for  output  processes  which 
are  stationary.  Because  the  random  disturbances,  e,  in  (1)  are  assumed  to  be 
normally  distributed,  we  can  equivalently  consider  the  output  process  to  be 
second-order  stationary.  In  theory,  discrete-time-parameter  stationary  pro¬ 
cesses  can  be  adequately  described  by  an  ARMA(p,q)  model  if  the  p  and  q  values 
are  allowed  to  be  appropriate  nonnegative  integers  (Cox  and  Miller  [1965, 
p.  288]).  As  emphasized  by  the  concept  of  parsimony  in  the  time  series  liter¬ 
ature  (Box  and  Jenkins  [1976,  p.  17]),  small  values  for  p  and  q  can  adequately 
fit  a  given  data  set  in  most  situations. 

The  ARMA  model  in  (1)  has  the  following  properties.  The  mean  of  the 
process  Is  given  by 


The  variance  of  the  process  is  given  by 

oj  -  al  R(5,e). 

The  specific  form  of  the  function  R  depends  on  the  order  (p,q).  For  example, 
the  Yiile-Walker  equations  (Box  and  Jenkins  [1976,  p.  75])  can  be  used  to  show 
that  for  an  ARMA(1,1)  model, 

al  -  (1  +  ej  -  2«^0^)/(l  -  <^J). 
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The  spectral  density  function  for  the  general  ARMA(p,q)  model  (Fuller 
[1976,  p.  146])  Is  given  by  (2); 

f((o)  -  a2(2Tr)“^(l  -  1  0,  ^  ^  (2) 

e  j-1  J  j-1  ^ 

-n  £  u  ^  ir. 

As  will  be  seen  In  Section  2,  the  spectral  density  plays  an  Integral  role  in 
estimating  the  variance  of  the  sample  mean. 

In  most  ARMA  modeling  applications,  the  objective  is  to  find  an  adequate 
representation  of  the  data  under  investigation.  Ihe  procedures  suggested  in 
Box  and  Jenkins  [1976]  for  finding  an  adequate  ARMA  model  Involve  the  well- 
kncwn  steps  of  Identification,  estimation,  and  diagnostic  checking.  The  order 
of  the  resulting  model  and  the  estimated  parameters  are  of  central  importance. 
The  fitted  model  is  then  used  as  a  surrogate  for  the  actual  process,  and 
provides  a  basis  for  forecasting. 

This  contrasts  with  our  situation,  in  which  fitting  an  ARMA  model  is  a 
means  to  the  end  of  forming  an  interval  estimate  on  the  mean  of  a  stationary 
simulation  output  process.  Of  course,  the  Important  steps  of  Identification, 
estimation,  and  diagnostic  checking  must  be  carried  out,  but  the  resulting 
model  order  and  parameter  values  are  not  the  end  result;  the  success  of  our 
overall  procedure  will  be  judged  principally  by  characteristics  of  the  confi¬ 
dence  Intervals  which  are  ultimately  produced. 

2.  CONFIDENCE-INTERVAL  METHODOLOGY 

The  flowchart  in  Figure  1  displays  the  six  key  steps  Involved  in  applying 
the  ARMA-based  procedures  fj»'  building  a  confidence  Interval.  The  overall 
objective  of  the  first  five  steps  is  to  estimate  the  variance  of  the  sample 
mean.  These  five  steps,  taken  together,  correspond  to  Step  b  in  the 
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Observatlons  (X^,  X2. • -X^) 


VariIX„]  Var2[Xnl 

(Method  1)  (Method  2) 


Determine  the  Degrees  of  Freedom 
and 

Compute  the  Confidence  Intervals 


Confidence  Interval  Confidence  Interval 


(Method  1) 


(Method  2) 


Figure  1:  Overview  of  the  Steps 
Involved  In  the  ARMA-Based  CIn 
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I 

i 


-7- 


Introductlon.  The  sixth  step  corresponds  to  Steps  c  and  d  In  the  Introduction. 
Detailed  commentary  on  these  steps  follows. 

Step  1.  Compute  the  Sample  Autocorrelations 

For  a  sequence  of  n  observations  (X^,  X2,...,X^),  the  first  50  sample 
autocorrelations  are  calculated.  Ihe  sanple  autocorrelation  of  lag  s  is  given 
by 


n-i 

i-1 


i  (X  -  X)^ 
1-1  ^ 


2}...^ 50. 


Step  2.  Identify  the  Candidate  ARMA  Order(s) 

The  methodology  of  Box  and  Jenkins  [1976]  for  Identifying  the  order  of  an 
ARMA  process  entails  a  visual  inspection  of  the  autocorrelation  function  (ACF) 
and  the  partial  autocorrelation  function  (PACF)  estimated  from  the  data. 

This  Inspection  Involves  a  subjective,  time-consuming  procedure  which  it  would 
be  desirable  to  automate.  Recently,  several  algorithms  for  automating  the 
identification  step  have  been  proposed  (Gray  et  al.  [1978];  Beguin  et  al. 
[1980];  Tlao  and  Tsay  [1981]).  We  use  the  algorithm  proposed  by  Gray  et  al. 

We  apply  the  algorithm  by  using  the  50  sample  autocorrelations  from  Step  1 
above  to  compute  what  Gray  et  al.  term  a  D  statistic.  We  compute  this  D 
statistic  for  each  of  12  ARMA  orders  corresponding  to  all  combinations  of  p 
and  q  for  p  -  1,  2,  and  3  and  q  -  0,  1,  2,  and  3.  IVo  sets,  each  consisting 
of  12  D  statistics,  are  computed:  one  for  what  is  called  the  high-frequency 
case,  and  the  other  for  what  is  called  the  low-frequency  case.  The  (p,q) 
combination  resulting  in  the  largest  D  statistic  for  the  high-frequency  case 
is  then  a  candidate  model,  as  is  the  (p,q}  combination  corresponding  to  the 
largest  D  statistic  for  the  low-frequency  case. 
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Step  3.  Estimate  the  Parameters  of  the  Candidate  ARMA  Model(8) 

In  this  step,  the  p  autoregressive  and  the  q  moving-average  coefficients 

2 

along  with  the  variance  of  the  disturbance  term,  o^,  are  estimated  for  the 
two  candidate  ARMA  models.  If  both  candidate  models  have  Identical  orders, 
there  Is  really  only  one  candidate  model,  and  the  estimation  process  need  be 
performed  only  once.  The  estimation  procedure  uses  subroutines  from  the 
International  Mathematical  &  Statistical  Library  (IMSL).  The  key  subroutine, 
FTMXL,  does  the  estimation  by  using  the  conditional  likelihood  method 
described  in  Box  and  Jenkins  [1976,  pp.  209-10].  (See  the  IMSL  Library 
Reference  Manual  [1980]  for  documentation.) 

Step  4.  Perform  diagnostic  Tests  on  the  Candidate  Model(s) 

In  this  step,  test  statistics  for  the  candidate  model(s)  are  computed 
and  evaluated  to  determine  whether  to  accept  a  model  as  adequately  fitting  the 
data.  Included  among  these  statistics  are  the  t  statistic  for  each  of  the  p 
autoregressive  and  q  moving-average  coefficients  in  the  model(s).  A  model 
Is  judged  unacceptable  unless  the  t  value  for  each  AR  and  MA  term  Is  at 
least  1.96.  This  cutoff  value  of  1.96  was  chosen  to  correspond  to  an  a  = 

.05  test  In  the  case  of  a  large  number  of  degrees  of  freedom. 

In  addition  to  the  coefficient  t  statistics,  the  Ljung-Box  [1978]  port- 
T  .  ..teau  statistic,  Q,  was  calculated  for  the  candidate  model(s).  The  value 
of  Q  Is  given  by 

10  - 

Q  -  n(n+2)  I  (n-k)  \  , 

k-1 

A 

where  r^^  is  the  estimated  autocorrelation  of  lag  k  of  the  residuals  of  the 

estimated  model.  We  use  10  autocorrelations  In  the  calculation  of  this  sta- 

2 

tlstic.  Q  therefore  has  an  approximate  x  distribution  with  10  -  p  -  q 
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degrees  of  freedom.  For  a  model  to  be  acceptable,  the  achieved  significance 
level  of  the  Q  statistic  must  be  greater  than  .05. 

In  the  case  of  two  candidate  ARMA  models,  It  Is  possible  that  both  models 
will  pass  the  tests  on  the  coefficient  t  statistics  and  on  the  Q  statistic. 

If  this  occurs,  we  choose  that  model  which  has  the  largest  minimum  achieved 
significance  level  on  the  coefficients.  Tl^e  purpose  of  this  procedure  Is  to 
discard  that  model  which  has  the  least  significant  parameter  estimate. 

It  is  possible  to  come  up  empty  in  Step  4  if  the  model  or  models  iden¬ 
tified  in  Step  2  and  estimated  in  Step  3  fall  to  pass  the  tests  on  the 
coefficient-  and  Q-statistlcs.  In  this  case,  and  as  shown  in  Figure  1,  we 
proceed  no  further  with  the  building  of  a  confidence  interval.  We  believe 
that  for  many  simulation  studies,  the  discarding  of  a  sequence  of  data  is  not 
serious.  In  some  data-collectlon  environments,  however,  such  an  outcome  would 
not  be  acceptable — for  example,  if  the  cost  of  generating  a  data  set  Is  unduly 
high. 


Step  5.  Estimate  the  Variance  of  the  Sample  Mean 

Once  a  model  has  been  accepted,  it  is  used  in  this  step  to  estimate  the 
variance  of  the  sample  mean.  TVo  alternative  estimation  methods  are  used  for 
this  purpose,  both  of  which  make  use  of  an  estimate  of  the  spectral  density 
function  (2)  as  an  Intermediate  step.  The  estimate  of  (2)  is  given  by 


f(a)) 


(2 


j-1  ^ 


-ia;j,2 


'  -Iwj  -2 
)  (1  -  e  , 

j-1  J 


(3) 


2 


where  6^,  and  are  the  respective  estimates  of  the  autoregressive  and 
moving-average  coefficients,  and  of  the  variance  of  the  disturbances.  We 


proceed  to  describe  the  alternative  variance-estimation  methods. 
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Method  1 

For  any  stationary  process  It  can  be  shown  that 
_  2 

VarfX  1  ■  c  a  /n  where  (4) 

‘  n'  n  X 

n-1 

c  -  1  +  2  I  (1  -  1/n)  p  . 

1*1 

2 

(See  Schmelser  [1982].)  Hence,  If  we  can  estimate  c^  and  we  can  estimate 
Var[X^].  Using  (3),  we  construct  the  autocovariance  function  (5): 

Y(8)  ■  2/q  f(a))  cos  SO)  dm;  s  =  0,  1,  (5) 

The  value  of  the  Integral  In  (5)  Is  calculated  using  Simpson's  rule,  with  the 
Integrand  evaluated  at  40  uniformly  spaced  Intervals  ranging  from  0  to  ir. 

For  1  q  +  1,  y(1)  Is  then  computed  from  the  recursive  relationship: 

Y(l)  -  !  If’,  Y(l-j)*  (6) 

j-1  J 

(See  Box  and  Jenkins  [1976,  p.  75].)  Using  (4),  we  then  have 

Var^[X^]  *  c^  Y(0)/n,  where 

n-1 

c„  -  1  +  2  I  (1  -  1/n)  t(1)/y(0). 

"  1*1 

Method  2 

If  the  spectral  density  function  of  a  stationary  time  series  Is  con¬ 
tinuous,  then 

11m  n  Var[X  ]  *  2iTf(0), 
n+w 

where  f(0)  Is  the  spectral  density  of  evaluated  at  zero  (see  Fuller  [1976], 
p.  232).  As  a  second  estimate  of  the  variance  of  the  sample  mean,  we  there¬ 


fore  use 
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Var-(X„)  -  2itf(0)/n, 
z  n 

where  from  (3),  f(0)  Is 

f(0)  -  3^  (2n)~^(l  -  5  0  (1  -  ! 

j-1  J  j-1  J 

(See  Pritsker  and  Pegden  [1979,  p.  481].) 


(7) 


Step  6.  Determine  the  Degrees  of  Freedom  and  Compute  the  Confidence  Intervals 
The  (1  -  a)100%  confidence  Interval  for  p  Is  given  by 


\±^/2.k 


with  <lenotlng  the  1  -  (a/2)  percentile  of  the  t  distribution  with  k 

degrees  of  freedom,  and 


A  A 

SD^  [X^]  ■  SQRT(Var^(X^) ] ;  i  »  1,  2  (the  two  methods). 

We  estimate  the  k  degrees  of  freedom  in  a  manner  similar  to  that  suggested  by 

Fishman  [1971).  If  we  have  a  sample  of  m  Independent  observations  from  a  dis- 

2 

tribution  with  mean  p  and  variance  o  identical  to  the  mean  and  variance  of 

X 

the  stationary  process  of  Interest,  then 

VarfX^]  «  o^/m.  (8) 

Equating  the  right-hand  sides  from  (8)  and  (4),  we  have 

n  =■  me  . 
n 

This  can  be  Interpreted  to  mean  that  in  a  degrees-of-freedom  sense,  each  in¬ 
dependent  observation  is  equivalent  to  c^  correlated  observations.  (See 
Schmelser  [1982]  for  a  further  discussion.)  We  therefore  specify  the  degrees 
of  freedom  to  be 

(n/c^)  -  p  -  q  -  1, 


(9) 


where,  In  addition  to  adjusting  the  equivalent  sample  size  by  dividing  n  by 


A 

c^,  we  also  lose  a  degree  of  freedom  for  each  estimated  autoregressive  and 

moving-average  coefficient  and  for  the  estimated  mean.  If  (9)  Is  less  than 

1,  we  set  the  degrees  of  freedom  equal  to  1.  Or,  If  c  <1,  meaning  that 

n 

n/c  >  n,  we  set  the  degrees  of  freedom  to  n  -  p  -  q  -  1. 
n 


3.  TESTING  PROCEDURE 


Eight  theoretical  output  processes  were  chosen  to  conp rehens Ively  test 
the  ARMA-based  confidence-interval  procedures,  using  fixed  retained  sample 
sizes  of  n  “  100,  200,  300,  and  400.  For  each  TOP/sample-size  combination, 
enough  replications  were  generated  to  build  100  confidence  intervals.  Each 
replication  was  produced  under  stationary  conditions;  in  addition,  the  first 
50  observations  were  deleted  from  each  replication.  Figure  2  shows  the 
replication  design  for  one  TOP  with  the  retained  sample  size  set  at  100.  In 
Figure  2,  denotes  the  1-th  observed  value 


51’** 

Si’** 


.  .X 


.  .X 


150 

2 

150 


x”  x” 


51 


discarded  retained 

observations  observations 


Enough  replications 
to  build  100  confidence 
Intervals 


Figure  2:  Replication  Design 


In  the  J-th  replication.  In  general,  more  than  100  replications  are  needed 
to  build  100  confidence  Intervals,  because  not  every  replication  results  in  a 
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statlstlcally  acceptable  ARMA  model.  The  number  of  replications  needed  to 
obtain  100  confidence  intervals,  denoted  by  N  in  Figure  2,  is  reported  in  the 
measure-of-ef fectiveness  tables  in  Section  4. 

Six  of  the  eight  TOPs  used  in  the  testing  were  tailor-made;  that  is, 
they  were  autoregressive  moving- average  processes.  This  set  of  ARMA  processes 
was  carefully  chosen  to  Include  two  pure  AR  processes  and  one  ARMA  process, 
whose  use  for  testing  purposes  has  previously  been  reported  in  the  literature, 
and  to  Include  ARMA  processes  providing  a  representative  range  of  behavior 
in  terras  of  their  autocorrelation  functions  and  their  limiting  value  of  c^. 

The  c^  value  provides  an  important  measure  of  correlation  structure,  and, 
as  discussed  above,  indicates  the  number  of  dependent  observations  equivalent 
to  one  independent  observation. 

We  proceed  to  discuss  each  of  the  eight  TOPs  individually,  providing 
the  rationale  for  their  choice  and  describing  their  relevant  properties. 

TOP  1 

The  equation  for  this  pure  autoregressive  TOP  is 

Xj.  =■  •5X^_^  +  .5  +  e^.  Ej.  ~  N(0,1).  (10) 

This  is  the  first  of  two  autoregressive  TOPs  used  by  Fishman  [1971]  to  eval¬ 
uate  his  proposed  AR-based  confidence-interval  procedure.  We  include  it  here 
to  support  comparisons  between  the  AR-  and  ARMA-based  CIFS. 

For  (10),  E(X^]  ■  1,  SD[X^]  »  / 4/3,  and  SD[X^]  «  Z/ZiT.  Furthermore, 
for  any  stationary  AR(1)  process  it  can  be  shown  that 

Lira  c  -  (1  -  ^J)/(l  -  -^  )^,  (11) 

n+ao 
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where  Is  the  autoregressive  coefficient.  Substituting  the  *0.5  from 
(10)  into  (11),  which  can  be  thought  of  as  a  global  measure  of  the  de¬ 

gree  of  dependence  Inherent  in  this  model  when  estimating  Var  (X^)* 

The  ACF  for  an  AR(1)  process  is  given  by 

-  0.5^  i  >  1. 

The  ACF  for  the  process  in  (10)  is  consequently  always  positive,  and  decays 
exponentially.  In  our  experience,  such  ACF  behavior  is  representative  of  out¬ 
puts  from  queuing  system  slnulations. 

TOP  2 

This  TOP,  also  purely  autoregressive  and  specified  as 
\  -  .5  -  .25  +  *5  +  ~  N(0,1),  (12) 

is  the  second  of  the  two  AR  TOft  used  by  Fishman  [1971].  For  (12),  E[X^]  » 
2/3,  SD[X^]  ■  1.13,  and  SD[X^]  «  1.31//n.  For  a  stationary  AR(2)  process, 
we  have 

i'i®  *  (1  +  +  <('^'('2  “  "  4’2)(1  "  “  <>2)* 

n-*’* 

where  and  ^2  autoregressive  coefficients.  Substituting  the  ap¬ 

propriate  values  from  (12)  into  (13),  c^  ♦  1.4  for  this  process. 

The  ACF  for  an  AR(2)  process  is  given  by 

-  0.4 

*0.5  “  0.25  Pj^_2  1  “  2,  3,  4,.... 

This  ACF  is  of  a  damped  sinusoidal  form.  Although  in  our  experience  this 


form  is  not  typical  of  outputs  from  queuing  system  simulations,  we  Include 
this  process  to  support  coqparlson  to  the  greatest  extent  possible  with 
Fishman's  [1971]  work. 
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TOP  3 

This  AR(1)  TOP  is  given  by 

Xj.  -  .8  +  200  +  Gj.  ~  N(0,3600).  (14) 

This  TOP  was  selected  for  test  purposes  because  Steudel  and  Wu  [1977]  Indicate 
that  the  behavior  of  an  M/M/1  queuing  system  having  a  high  server  utilization 
can  be  modeled  by  an  AR(1)  process  for  which  approaches  1.  For  the  pro¬ 
cess  In  (14),  E[X^]  -  1,000,  SD[X^]  *  100,  and  SD[X^J  «■  300//n.  Substi¬ 
tuting  0.8  from  (14)  into  (11)  indlcat  that  c^  +  9  for  this  process.  And, 
as  for  TOP  1,  the  ACF  decays  exponentially. 

TOP  4 

This  ARMA  TOP,  which  Is  the  first  of  three  mixed  ARMA  models  used  for 
testing  purposes  here.  Is  given  by 

X^  -  -»■  300  +  ~  N(0,2965.1).  (15) 

For  this  process,  E[X^]  -  1,000,  SOfR^.]  -  100,  and  SD[X^]  •  254//n.  For 
any  stationary  ARMA(1,1)  process, 

(1  -  4..0.)(.^.  -  e  ) 

Lim  c  -1  +  2  - — - - - -  (16) 

"  (1  -  <>j)(i  +  e{  - 

Using  the  coefficients  from  (15)  in  (16),  c^  ♦  6.46  for  this  process. 

The  AGP  for  this  ARMA(1,1)  process  has 

-  0.8186,  and 

-  0.7  for  1  “  2,  3,  4,.... 

This  ACF  is  always  positive,  and  decays  exponentially.  As  mentioned  above, 
our  experience  Indicates  that  output  from  queuing  system  simulations  often 


exhibits  such  behavior. 
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TOP  5 

This  ARMA(2,1)  model,  given  by 

-  1. 32X^  ,  -.68X^  .  +  360  +  -  .8e^  , 

t  t-1  t-z  t  t-i 

~N{0,  5373.1),  (17) 

was  used  by  Gray  et  al.  11978]  to  test  their  algorithm  for  automatic  Identi¬ 
fication  of  an  ARMA  process.  We  use  it  here  to  support  comparison  with  their 
results  In  terms  of  the  ability  of  their  algorithm  to  correctly  identify  an 
ARMA  process. 

For  the  process  in  (17),  E(X^]  -  1,000,  SD[X^]  -  100,  and  SD[X^] 

»  40//n. 

For  a  stationary  A8MA(2,1)  process  we  have 


Llm  c 


(l-^^-4>2)(2<^je^+(^2+'J>2®r®r^^ 


Using  values  from  (17)  in  (18),  c^  ♦  0.16  for  this  process.  This  c^  <  1 
Indicates  that  the  variance  of  the  sample  mean  for  this  correlated  process  will 


be  smaller  than  the  variance  of  the  sample  mean  for  an  independent  process 


with  the  same  underlying  variance.  This  might  contribute  to  the  notable 


ability  of  Gray  et  al.  's  algorithm  to  correctly  identify  data  generated  by 
(17)  as  coming  from  an  ARMA(2,1)  process  (as  reported  by  Gray  et  al. ,  and  as 
substantiated  here  In  Section  5). 


The  ACF  for  this  ABMA(2,1)  process  has 
■  0.5299,  and 

Pj  ■  1.32  ”  0.68  Pj_2  1  ■  2,  3,  4,.... 

The  ACF  consequently  shews  damped  sinusoidal  behavior. 
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TOP  6 

This  ARMA(2,1)  model  Is  given  by 


Cj.  ~  N(0,  1271.  5). 


(19) 


This  process  has  E[X^]  -  1,000,  SD[X^1  -  100,  and  SD[X^]  »  242/ /n.  Using 
values  from  (19)  In  (18),  c^  +  5.85  for  this  process. 

The  ACF  is  given  by 
*  0.8597,  and 

-  0.9  ~  0.18  P^_2  1  *  2,  3,  4,.... 

This  ACF  is  always  positive  and  decreases  exponentially. 

With  its  c^  >  1  and  its  ACF  properties,  TOP  6  was  designed  to  contrast 
with  TOP  5.  We  believe  that  In  these  measures  TOP  6  corresponds  more  closely 
to  typical  queuing  system  slnulatlon  output  than  does  TOP  5. 


TOP  7 

TOIh  7  and  8  are  based  on  output  processes  associated  with  queuing 
system  simulations.  Ihese  two  TOIb  were  chosen  to  Investigate  the  potential 
applicability  of  the  ARMA-based  CIP  to  the  queuing  system  class  of  non- 
ARMA  TOlh . 

TOP  7  Is  based  on  a  materlals-handllng  problem  described  In  Hilller  and 

Lleberman  [1974,  p.  465],  In  which  the  output  random  variable  is  cost  per  unit 

time.  Here  Is  a  paraphrased  statement  of  the  problem: 

"A  certain  materlals-handllng  unit  Is  used  to 
transport  goods  between  producing  centers  In  a 
Job  shop.  Calls  for  the  materlals-handllng  unit 
to  move  a  load  come  essentially  at  random 
(l.e.  ,  according  to  a  Poisson  Input  process) 


at  a  mean  rate  of  two  per  hour.  The  total  time 
required  to  move  a  load  has  an  exponential 
distribution  with  an  expected  time  of  d  minutes. 

The  total  equivalent  uniform  hourly  cost 
(capital  recovery  cost,  plus  operating  cost) 
for  the  raaterlals-handllng  unit  Is  PC.  The 
estimated  cost  of  Idle  goods  (waiting  to  he 
moved,  or  In  transit)  because  of  Increased 
In-process  Inventory  Is  $10  per  load  per 
hour.  Furthermore,  the  scheduling  of  the 
work  at  the  producing  center  allows  for 
lust  one  hour  from  the  completion  of  a  load 
at  one  center  to  the  arrival  of  that  load  at 
the  next  center.  Therefore,  an  additional  S20 
per  load  per  hour  of  delay  (Including  transit 
time)  after  the  first  hour  Is  to  be  charged 
for  lost  production.  What  Is  the  expected 
cost  of  this -system  (defined  as  the  sum  of 
delay  cost  and  equipment  cost)  as  a  function 
of  d  and  FC?" 

We  slnulate  the  behavior  of  this  system,  setting  specific  values  for  d  and  FC 
and  taking  periodic  observations  on  the  cost  accumulated  by  the  system  over 
time.  letting  TC  be  the  cost  per  hour,  it  can  be  shown  that  for  d  <  30, 

EfTC]  «■  {2d/(60  -  2d)}  {20  exp  ((2d  -  60) /d)  +  10}  +  FC. 

Hence,  the  performance  of  the  ARMA-based  confidence-interval  procedure  can  be 
measured  In  terms  of  the  known  mean  of  the  cost  variable. 

One  key  decision  In  using  this  system  as  a  test  case  Involves  setting 
the  Interobservation  time.  We  set  this  value  at  eight  siwlated  hours  (one 
shift),  which  Is  16  times  the  expected  job  Interarrlval  time.  By  way  of 
conparlson,  Fishman  [1071]  chose  an  Interobservation  time  4  times  the  Inter¬ 
arrival  time  to  observe  current  queue  content  in  an  M/M/1  system  used  to  test 
his  AR-based  confidence-interval  procedure.  In  contrast,  Steudel  and  Wu 
[1977]  recommend  that  an  Interobservation  time  10  times  greater  than  the 
service  time  be  used  In  observing  current  queue  content  in  job-shop  simula¬ 
tions.  In  general,  no  conp rehens Ive  guidelines  have  been  reported  for 
choosing  the  Interobservation  time  In  experiments  of  this  type. 


-19- 


Another  key  decision  In  this  system  Involves  selecting  a  method  to  follow 
In  accunulating  the  system  cost.  In  one  method,  the  cost  could  be  based  only 
on  those  jobs  which  have  left  the  system  during  the  current  observation 
period.  In  another  method,  the  cost  could  he  based  on  all  jobs  which  have 
been,  and  perhaps  still  are,  In  the  system  at  any  time  during  the  course  of 
the  current  observation  period.  The  expected  total  cost  per  unit  time  will  be 
the  same  for  either  method,  but  the  variability  of  the  cost  will  not.  Vte  use 
the  second  of  these  two  cost-accumulation  methods  because  the  variability 
associated  with  It  Is  smaller. 

In  setting  parameter  values  for  this  TOP,  we  chose  d  »  24  minutes  and 
FC  ■  $10.  This  results  in  a  server  utilization  of  0. P  and  leads  to  an  ex¬ 
pected  dally  system  cost  of  $788.18. 

TOP  8 

For  TOP  8  we  worked  with  an  M/D/3  queuing  system,  choosing  current  system 
content  as  the  output  random  variable  of  interest.  Analytic  solutions  for 
this  system  have  been  evaluated  numerically  by  Wlllier  and  Yu  fl981j,  making 
It  easy  to  assess  performance  of  the  ARMA-based  confidence-interval  procedure 
In  terms  of  the  known  system  properties.  By  way  of  contrast'  with  TOP  7,  the 
output  random  variable  chosen  here  takes  on  a  noncumulatlve  value;  that  Is, 
the  value  observed  Is  a  point  value,  not  a  value  accumulated  during  the  course 
of  the  observation  period. 

In  the  M/D/3  system,  Interarrlval  time  was  set  to  5  minutes,  service 
time  to  13.5  minutes,  and  Interobservation  time  to  20  minutes.  The  Inter¬ 
arrlval  and  service-time  settings  result  In  an  0.9  server  utilization,  and 
give  an  expected  system  content  of  6.42. 


-20- 


4.  TESTING  ENVIRONMENT 

The  software  used  In  this  research  consisted  of  custom-built  modules 
combined  with  proven  existing  routines.  The  existing  routines  Included 
certain  IMSL  [1980]  subroutines  and  the  Michigan  Interactive  Data  A'  alysis 
System  (Fox  and  Ghlre  [1976]).  With  the  one  exception  noted  below,  the 
custom-built  modules  were  written  in  FORTRAN,  were  checked  out  under  an 
Interactive  FORTRAN  Interpreter,  and  then  were  translated  under  an  optimizing 
FORTRAN  compiler  prior  to  their  use  In  making  the  production  runs. 

The  two  queuing-system  TOIh  were  built  In  GPSS.  The  GPSS  models  were 
run  under  GPSS/H  (Henrlksen  and  Crain  [1982]),  a  state-of-the-art  GPSS 
Implementation.  GPSS/H  uses  the  Tausworthe  [1965]  algorithm  to  generate 
uniform  0-1  random  numbers.  The  exponentially  distributed  Interarrlval  and 
service  times  In  the  GPSS  models  were  sampled  using  the  standard  natural 
logarithm  function  from  the  FORTRAN  library.  This  Is  superior  to  the  more 
conventional  GPSS  approach  of  using  a  piecewise  linear  approximation  to  the 
inverse  cdf  of  the  exponential  distribution  (see  Schrlber  [1974,  p.  163]). 

The  conputlng  work  was  accomplished  on  an  Amdahl  470/V8  operating  under 
the  Michigan  Terminal  System  at  The  University  of  Michigan. 

5.  TEST  RESULTS 

Results  of  using  the  two  versions  of  the  ARMA-based  confidence-interval 
procedure  with  the  eight  theoretical  output  processes  are  presented  here  In  a 
set  of  eight  Identically  formatted  tables,  following  the  suggestion  of  Schrlber 
and  Andrews  [1981].  The  four  table  rows  correspond  to  100  replications 
consisting  of  100,  200,  300,  and  400  observations,  respectively.  Each  of  the 
five  table  coltmns  corresponds  to  a  particular  measure  of  effectiveness  of  the 
confidence-interval  procedure.  These  five  MOBs  will  be  described  before  the 
tables  themselves  are  presented  and  discussed. 


-21- 


For  the  six  ABMA  TOFs,  table  column  1  (MOE  1)  reports  the  percentage  of 
accepted  ARMA  models  whose  (p,q)  order  matched  that  of  the  known  underlying 
ARMA  process.  Recall  that  a  candidate  ARMA  model  was  accepted  only  If  various 
test  statistics  had  satisfactory  values.  Also  note  that  because  each  MOE  1 
percentage  Is  based  on  100  replications,  the  percent  can  alternatively  be 
thought  of  as  an  actual  count.  This  MOE  measures  the  ability  of  the  Gray 
et  al.  algorithm  to  correctly  Identify  the  order  of  the  ARMA  process  used  to 
generate  the  time  series  being  analyzed.  MOE  1  does  not  have  an  Interpreta¬ 
tion  for  queuing-system  TOft  7  and  8,  and  so  is  marked  NA  (not  applicable) 
in  Tables  VII  and  VIII.  The  orders  of  the  ARMA  models  accepted  for  the 
queuing-system  TOft  are  of  Interest,  however,  and  will  be  presented  sepa¬ 
rately  when  test  results  for  those  TOBs  are  discussed. 

Table  column  2  (MOE  2)  provides  a  measure  of  the  coverage  properties  of 

the  confidence  Intervals  built  for  the  accepted  ARMA  models.  In  particular, 

2 

the  number  reported  in  column  2  is  the  achieved  significance  level  of  a  x 
test  for  uniformity  In  the  distribution  of  the  random  variable 
n*  »  inf{n:6cC(X^,  X2,  ...,  X^;  n)}, 

where  6  «  the  process  parameter  of  interest, 
n  >  a  confidence  level,  and 

C(X^,  X2...X^;n)  ■  a  confidence  Interval  based  on  the  sequence 
^1’  ^2’'’'’^n  confidence  level  n. 

The  random  variable  n*  Is  the  confidence  level  that  Just  succeeds  In  covering 
the  parameter  of  Interest,  which  In  our  case  Is  the  process  mean.  The  dis¬ 
tribution  of  n*  Is  referred  to  as  the  coverage  function.  For  a  theoreti¬ 
cally  perfect  confidence-interval  procedure,  n*  follows  a  uniform  (0,1) 
distribution.  (See  Schruben  [1980]  for  details.) 
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2 

We  conducted  the  x  goodness-of-f it  test  by  dividing  the  (0,1)  interval 
into  10  cells  of  equal  width  and  then  computing  the  corresponding  test 
statistic.  Tjow  values  of  the  achieved  significance  level  of  this  statistic 
would  indicate  that  the  observed  n*'s  do  not  conform  to  the  theoretically 
correct  uniform  (0,1)  distribution.  Tn  particular,  any  value  lower  than,  say, 
0.05  suggests  that  the  CIP  is  suspect  in  terms  of  its  ability  to  produce 
meaningful  confidence  Intervals  for  the  TOP  at  hand. 

Table  column  3  (MOE  3)  provides  a  measure  of  the  degree  of  variability 
in  the  halfwidths  of  the  confidence  Intervals.  In  particular,  it  reports  the 
estimated  coefficient  of  variation  (CV)  of  the  standard  error  of  the  mean. 

This  estimate  is  computed  as  follows: 

CV  -  SD  fsn(x^)]/sn(x^), 

where 


1  ? 

SDfSn(X^)]  =  99  '  "  SO(X^))^; 

and  SDj(X^)  is  the  standard  error  on  the  jth  replication.  For  ild  observa¬ 
tions  taken  from  a  normal  distribution,  Schmeiser  [1982]  derived  CV 
analytically,  one  form  of  which  is 

CV  -  {tr(B±i)i^-  -  [r(|)i2}^'V(J). 

Note  that  CV  in  this  case  depends  only  on  the  sample  size,  n.  For  iid  normal 
samples  of  size  100,  200,  300,  and  400,  the  corresponding  CV  values  would  he 
0.071,  0.050,  0.041,  and  0.035.  These  numbers  provide  a  benchmark  for  MOE  3. 
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Table  column  4  (MOE  4)  provides  the  two  conventional  measures  for 
reporting  the  properties  of  a  confidence-interval  procedure.  This  column 
indicates  the  average  relative  halfwldth  of  the  confidence  intervals  built  at 
a  95%  confidence  level,  and  the  percentage  of  these  intervals  which  cover  the 
process  mean. 

Table  column  5  (MOE  5)  Indicates  how  many  replications  had  to  be  generated 
to 'obtain  100  usable  replications.  A  usable  replication  is  one  to  which  an 
ARMA  model  can  be  fitted  acceptably  in  the  statistical  sense  described  in 
Section  3.  Uhls  measure,  which  involves  the  ability  of  the  ARMA-based  CIP  to 
produce  a  confidence  Interval  for  the  TOP  at  hand,  is  reported  as  the  ratio  of 
replications  generated  to  replications  used. 

In  examining  the  MOE  tables,  the  following  points  should  be  kept  in  mind: 

1.  How  do  the  two  versions  of  the  ARMA-based  CIP  cotipare? 

(Only  MOEs  2,  3,  and  4  will  depend  on  the  version  in  question.) 

2.  How  adequately  do  the  two  Gift  perform  for  the  TOP  used? 

3.  Do  the  properties  of  the  CIP  as  measured  by  the  MOEs  Improve 
as  sample  size  Increases? 

4.  If  one  or  more  table  values  are  not  what  we  would  expect, 
can  we  identify  the  underlying  cause  and  their  implications? 

Tables  I  and  II  give  the  results  of  analyzing  the  AR(1)  and  AR(2)  models 
used  by  Fishman  [1971].  In  the  AR(1)  case,  83,  88,  86,  and  98  percent  of  the 
accepted  ARNA  models  were  correctly  identified  to  be  of  (1,0)  order  for 
replications  consisting  of  100,  200,  300,  and  400  observations,  respectively 
(column  1,  Table  I).  This  indicates  good  performance  on  the  part  of  the 
identification  algorithm  for  this  AR(1)  TOP.  For  the  AR(2)  case,  the  percen¬ 
tages  of  accepted  ARMA  models  which  matched  the  underlying  (2,0)  order  were 
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also  In  the  80%  range  except  for  satnple  size  100  (column  1,  Table  II). 

To  obtain  100  acceptable  ARMA  models,  we  needed  at  most  147  replications  for 
the  AR(1)  TOP  (column  5,  Table  I),  but  as  many  as  202  replications  from  the 
AR(2)  TOP  were  needed  to  obtain  100  acceptable  models  (column  5,  Table  II). 

The  achieved  significance  levels  of  the  coverage  function  (column  2) 
are  consistently  excellent  In  Table  I,  and  are  completely  satlsfatory  in  Table 
II  for  samples  of  size  200,  300,  and  400.  For  samples  of  size  100  In  Table 
II,  however,  the  hypothesis  regarding  uniformity  of  n*  would  be  rejected  at 
a  0.05  significance  level.  Note  that  this  is  also  the  sample  size  In  Table  II 
for  which  the  relatively  small  percent  of  correct  identification  and 
relatively  large  replication  ratio  were  experienced. 

The  entries  In  column  3  (MOE  3)  In  Tables  I  and  II  provide  a  measure  of 
the  variability  In  the  width  of  the  confidence  intervals.  The  column  3  values 
are  larger  than  the  lid-normal  benchmark  values  reported  above,  which  might 
reflect  the  dependency  in  the  data.  like  the  benchmark  values,  the  column  3 
values  decrease  as  sample  size  Increases.  R>r  TOBs  1  and  2,  and  for  the 
other  six  TOft  as  well,  the  variability  of  the  confidence  Interval  width  Is 
larger  for  Method  2  than  for  Method  1.  In  all  cases,  however,  the  differences 
are  small. 

Column  4(b)  In  Tables  I  and  II  indicates  that  the  coverage  rate  of  con¬ 
fidence  Intervals  at  a  95%  confidence  level  was  close  to  0.95  in  most  cases. 
The  entries  In  column  4(a)  report  the  average  relative  halfwidth.  This 
measure  Is  useful  when  it  is  of  Interest  to  estimate  the  sample  size  required 
to  achieve  a  specified  relative  halfwldth  in  a  confidence  interval.  Ihese 
values  range  from  .19  in  Table  II  for  a  sample  size  of  400,  to  .409  in  Table  I 
for  a  st..4>le  size  of  100. 
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The  only  measure  which  can  be  conpared  with  the  results  reported  by 
Fishman  (1971]  Is  MOE  1.  For  these  two  models,  Fishman  reported  correct 
Identification  In  77%  to  86%  of  the  cases.  These  percentages  are  not  directly 
conparable  with  MOE  1  here  because  our  percentages  do  not  Include  those  repli¬ 
cations  for  which  no  statistically  acceptable  ARMA  model  could  be  fitted. 
Furthermore,  Fishman  used  a  variable  sample  size  scheme,  extending  the  size 
as  necessary  to  achieve  a  stated  relative  halfwldth.  His  sample  sizes  ranged 
from  about  250  to  350.  Fishman  did  not  report  coverage  measures  for  the  AR(1) 
and  AR(2)  models,  so  coverage  coiqparlsons  cannot  be  made. 

Table  III  reports  results  for  the  AR(1)  model  used  as  TOP  3.  The  table 
contains  superb  values  for  all  measures.  In  terms  of  Identification,  the 
procedure  did  better  with  this  process  than  It  did  with  the  AR(1)  process 
reported  In  Table  I.  TOP  3  had  ((ij^  =  0.8,  as  conpared  with  0.5  for  TOP 
1.  The  corresponding  limiting  values  of  c^  were  9  and  3.  It  would  therefore 
seem  that  the  Gray  et  al.  algorithm  correctly  Identifies  the  underlying  order 
a  higher  percent  of  the  time  when  the  correlation  structure  as  measured  by 
the  limiting  value  of  Is  stronger. 

Table  IV  corresponds  to  TOP  4,  which  is  a  mixed  ARMA(1,1)  model.  The 

Table  IV  results  are  excellent  except  for  the  small  percent  of  correctly 

Identified  models  and  the  low  achieved  significance  level  for  n*  at  sanple 

size  100.  It  Is  worthwhile  to  attempt  to  explain  why  the  coverage  at  a  *^5% 

confidence  level  Is  satisfactory  when  n  »  100,  whereas  the  small  achieved 

significance  level  for  MOE  2  at  the  same  sample  size  Indicates  that  the 

coverage  was  not  satisfactory  for  all  confidence  levels.  One  possible  cause 

Is  that  there  may  be  a  significant  correlation  between  the  estimate  of  the 

mean,  X  ,  and  the  estimated  standard  deviation  of  X  .  If  this  correlation 
n  n 

exists,  then  the  numerator  and  denominator  of  a  statistic  assumed  to  follow 


Measures  of  Effectiveness  for  Analysis  of  an  ARMA  (1,0)  Model 


-2P- 


the  t  distribution  violate  the  assmptlon  that  they  are  Independent.  This  Is 

one  of  the  potential  problens  which  Schruben  [l^RO]  Indicates  can  lead  to  poor 

performance  of  the  empirical  coverage  function.  However,  this  is  not  a 

problem  with  this  ARMA(1,1)  process.  For  the  100  usable  replications  at 

sample  size  100,  the  achieved  correlation  between  X  and  the  estimated  standard 

n 

deviation  of  was  0.15  for  both  methods  of  estimating  the  variance  of  the 
sanple  mean.  At  a  significance  level  of  .05,  the  critical  value  of  the 
correlation  coefficient  Is  .20,  so  the  hypothesis  that  these  statistics  are 
uncorrelated  would  be  accepted.  The  poor  realization  of  the  coverage  function 
Is  traceable  to  the  fact  that  21  (20  in  the  case  of  Method  2)  of  the  100 
confidence  Intervals  had  an  achieved  n*  between  0.60  and  0.70.  Tbls  may  be 
attributable  to  unexplained  randomness. 


Measures  of  Effectiveness  for  Analysis  of  an  ARMA  (1,1)  Model 


-11- 


Tables  V  and  VI  correspond  to  TOft  5  and  6,  both  of  which  are  ARMA(2,1) 
models-  Ihree  points  should  be  kept  in  mind  when  examining  these  tables: 

1.  In  Table  V,  the  alternative  methods  for  estimating 
Var[X^]  give  discernibly  different  results  for  the 
first  time; 

2.  The  percent  of  correct  identifications  in  Table  VI 
is  low  for  the  first  time;  and 

3.  The  values  of  the  average  relative  halfwidth  are 
extremely  small,  which  merits  comment. 

In  Table  V,  at  sample  sizes  of  100  and  200,  MOE  2  reports  low  values  for 
Method  2  but  acceptable  values  for  Method  1-  It  should  be  noted  that  Method  1 
uses  equations  (5)  and  (6)  to  smooth  out  the  autocorrelation  function.  This 
results  in  better  empirical  autocorrelation  properties  for  this  particular 
theoretical  output  process. 

The  inferior  results  for  Method  2  at  sample  sizes  of  100  and  200  can  be 
explained  in  terms  of  the  values  calculated  for  Var^fX^].  From  (7), 


«2  V  ■'  2 

-  I  ,  1/2 

L  ^  «  '2^ 

n  [1  -  I  4>1 


The  resulting  estimated  standard  deviation  may  be  very  large  if  ^  “  1  or 
it  may  be  very  small  if  0  <•  1.  For  the  replications  with  sample  sizes  of 


100  and  200,  small  values  for  SD^fX^l  occurred  quite  often.  The  following 


list  gives  some  of  the  cases  for  which  was  very  small  con|>ared  to 

so^rxj. 


32 
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n* 


SDi[XJ  SD^IX^] 


2.23 

0.91 

2.58 

1.31 

2.83 

0.72 

2.03 

0.72 

2.51 

0.94 

1.29 

0.22 

Method  1  Method  2 


1.0 

1.0 

0.92 

1.0 

0.27 

0.83 

0.81 

1.0 

0.26 

0.63 

0.20 

0.85 

As  can  be  seen  in  these  examples,  the  small  values  of  SD^tX^]  yield  large 
values  of  n*  which,  in  turn,  result  in  an  empirical  coverage  function  not 
conforming  to  uniform  (0,1).  As  further  evidence  of  this  point,  the  average 
value  of  SD^[X^]  for  the  100  replications  was  5.16  for  sample  size  100  (2.97 
for  sample  size  200),  whereas  for  SD^fX^]  it  was  4.56  for  sample  size  100  (2.73 
for  sample  size  200).  This  suggests  that  Var2[X^]  is  underestimating  Var[X^]. 
At  sample  sizes  of  300  and  400  the  average  SD2[X^]  was  again  smaller  than  the 
average  SDj^fX^];  however,  the  difference  was  not  sufficient  to  degrade  the 
coverage  function  because  the  extreme  lower  tall  values  were  not  persistent. 

For  example,  of  the  100  replications  at  sample  size  100,  no  SD^[X^]  values 
were  smaller  than  2.0;  however,  12  of  the  SD2[X^]  were  smaller  than  2.0. 

For  sample  size  200,  no  SDj[X^]  values  were  smaller  than  1.2,  but  7  of  the 
®^2^Xjjl  values  were.  This  explains  the  poor  performance  of  the  coverage 
functions  for  Method  2  at  sample  sizes  of  100  and  200. 

All  of  the  reported  measures  of  effectiveness  in  Table  VI  are  adequate 
except  for  MOE  1,  where  the  percent  of  correctly  identified  ARMA  models  was 
very  low  at  all  sample  sizes.  An  explanation  for  this  mlsidentif icatlon  lies 
with  the  choice  of  ^2»  which  for  this  process  has  a  value  of  -0.18.  This 
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relatlvely  small  value  results  In  data  adequately  fitted  by  ARMA(1,1)  models. 
In  fact,  of  the  100  fitted  ARMA  models  for  each  of  the  four  sample  sizes,  62, 
66,  62,  and  63,  respectively,  were  of  (1,1)  order.  This  largely  explains  why 
there  were  so  many  mlsldentlf Icatlons.  In  spite  of  these  mlsldentlf Icatlons, 
however,  the  coverage  properties  were  still  satisfactory  for  this  theoretical 
output  process. 

The  values  of  the  average  relative  half width  listed  in  column  4(a),  Table 
VI,  demonstrate  how  this  measure  decreases  with  Increasing  sample  size.  The 
very  small  values  for  this  measure  are  misleading,  however.  In  that  there  Is 
no  standard  against  which  to  compare  them.  Simply  by  changing  the  process 
mean  for  which  the  data  were  generated,  the  relative  halfwldths  can  be  changed 
without  either  Improving  or  degrading  the  confidence-interval  procedure. 
Average  relative  halfwldths  consequently  are  not  comparable  across  TOBs  having 
nonldentlcal  means.  For  example,  the  seemingly  large  values  of  this  measure 
In  Table  II  were  for  a  process  mean  of  0.6667,  whereas  the  process  mean  for 
the  Table  V  TOP  Is  1000.  It  is  important  to  be  aware  that  the  average 
relatlve-halfwidth  measure  should  only  be  used  for  comparison  purposes  as 
sample  size  changes  for  a  given  TOP. 

Tables  VII  and  VIII  report  the  results  for  queuing- system  TORs  7  and  8, 
respectively.  The  small  values  for  MOE  2  in  these  tables  Indicate  that  the 
coverage  function  for  these  TOft  does  not  conform  to  a  uniform  distribution. 
The  coverage  percentages  given  as  MOE  4(b)  also  fall  short  of  the  95%  level. 

We  will  new  discuss  three  sources  of  possible  error  which  could  account  for 
the  poor  performance  of  the  coverage  properties  with  respect  to  these  queuing 


models 
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1.  Correlation  between  sample  mean  anH  its  standard  error 

The  use  of  the  t  statistic  In  buildlnf*  a  confidence  interval  assumes  that 
and  SD[X^]  are  Independent.  The  extent  to  which  this  assumption  is  satis¬ 
fied  by  the  data  can  be  checked  by  using  the  icn  replications  at  a  given  sample 
size  to  estimate  the  correlation  between  these  two  statistics.  We  computed 
the  correlation  between  the  sample  means  and  their  standard  deviations  as 
estimated  by  Methods  1  and  2  for  all  eight  TORs .  The  resulting  correlation 
coefficients  appear  in  Table  IX. 

The  critical  value  at  a  =  .01  for  the  Table  IX  correlations  under  the 

assumption  of  bivariate  normality  is  0.26.  Inspecting  the  table,  we  see  that 

the  ARMA  TOI^  1  through  6  had  Insignificant  correlations  for  both  versions 

of  the  estimators  of  Var[X  1.  However,  there  were  significant  correlations 

n 

in  all  cases  Involving  queuing-system  TOPs  7  and  «.  This  correlation  between 
the  sample  means  and  their  estimated  standard  deviations  contributes  adversely 
to  the  behavior  of  the  coverage  function. 

2.  Distribution  of  the  disturbance  terms 

The  ARMA  model  in  (1)  assumes  that  the  disturbance  terms  are  normally 
distributed.  This  assumption,  in  turn,  forms  part  of  the  basis  for  testing 
the  statistical  acceptability  of  the  fitted  ARMA  models,  and  for  the  subse¬ 
quent  confidence-interval  methodology.  The  validity  of  this  assumption  can  be 
tested  by  investigating  the  distribution  of  the  residuals  of  the  fitted 
models. 

We  chose  four  replications,  each  of  sample  size  400,  on  which  to  check 
the  normality  assumption  for  the  residuals.  The  first  two  replications  were 
for  TOP  5,  the  ARMA(2,1)  process  for  which  these  CIP  procedures  work  well. 

The  first  replication  had  a  reported  n*  of  0.04,  and  the  second  had  a 
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reported  n*  of  1.0.  This  means  that  the  confidence  Interval  from  the  first 

replication  covered  the  true  mean  for  all  confidence  levels  at  or  above  9%. 

The  confidence  Interval  associated  with  the  second  replication  covered  only  at 

confidence  levels  approaching  100%. 

The  other  two  replications  were  chosen  from  TOP  8.  Their  respective 

n*'s  were  0.37  and  1.0.  Hence,  these  replications  had  specifications  similar 

to  those  of  the  two  replications  chosen  from  TOP  5. 

Table  X  summarizes  the  results  of  checking  the  residuals  from  the  chosen 

four  replications  for  normality  with  a  mean  of  zero.  The  table  indicates  the 

mean,  standard  deviation,  skewness,  kurtosis,  and  the  achieved  significance 
2 

level  of  the  x  goodness-of-f It  test  for  normality. 

As  expected,  the  distribution  of  the  residuals  from  the  tailor-made  TOP  5 
Is  consistent  with  the  underlying  assumption  of  normality.  Their  kurtosis 
(which  can  be  conpared  to  a  kurtosis  of  zero  for  the  normal  distribution)  and 
skewness  (which  can  be  compared  to  a  skewness  of  zero  for  the  normal)  sub¬ 
stantiate  normality,  as  does  the  achieved  significance  level  of  the  associated 
2 

X  statistics. 

However,  the  measures  of  skewness,  kurtosis,  and  goodness-of-f it  do  not 
support  the  assumption  of  normality  for  the  queulng-TOP  residuals.  The  skew¬ 
ness  and  kurtosis  are  not  close  to  zero,  and  the  small  values  of  the  achieved 
significance  levels  of  the  goodness-of-f It  tests  Indicate  a  re  lection  of  the 
normality  hypothesis.  The  lack  of  normality  in  the  residuals  Is  a  factor 
contributing  to  the  poor  coverage  encountered  when  working  with  observations 
produced  by  the  queuing-system  sinulations. 


Table  X 


Residual  Analysis 


Replication  1  Replication  2  Replication  3  Replication  4 
(TOP  5,  (TOP  5,  n*«l.  00)  (TOR  R,  n*°.  37)  (TOP  S.  n*-l-00) 


Mean 

-.641 

.516 

-.007 

-.022 

Standard 

Deviation 

76.12 

7  2.64 

1.98 

1.  98 

Skewness 

.061 

.095 

.430 

.312 

Kurtosls 

-.220 

-.009 

.372 

-.261 

Achieved 
Signifi¬ 
cance 
Level  of 
X^-Test 


22 


68 


00 


09 


3.  Aberrant  behavior  of  the  replicated  means 

A  third  possible  contributor  to  poor  coverage  Involves  situations  in 
which  achieved  sample  means  may  persistently  be  far  removed  from  the  process 
mean.  If  the  replicated  means  consistently  tend  to  be  far  removed  from  the 
process  mean,  then  this  can  result  In  poor  performance  characteristics  for  any 
confidence-interval  procedure.  The  halfwidths  needed  to  cover  the  process 
mean  tend  to  be  persistently  large  In  such  a  situation,  resulting  In  large 
values  of  n*  and  strongly  nonuniform  behavior  for  the  coverage  function. 

To  Investigate  this  possibility  here,  we  computed  the  grand  mean  for 
all  100  replications  at  each  sample  size  for  TOPS  7  and  8.  Ihen  we  evaluated 
MOE  2  and  4(b)  for  each  combination  to  determine  the  extent  to  which  the  ARMA- 
based  confidence  Intervals  covered  these  grand  means.  In  essence,  we  were 
adjusting  for  the  bias  In  the  generating  process.  The  MOE  2  and  4(b)  coverage 
properties  Improved  slightly  but  were  still  unsatisfactory  except  for  sample 
size  400  with  TOP  7.  Me  conclude  that  the  generation  process  was  not  the 
cause  of  poor  coverage  with  the  queuing  TOI%. 

The  final  aspect  of  the  test  results  involves  the  order  of  the  fitted 
ARMA  models.  The  counts  of  the  achieved  orders  for  TORs  7  and  fl  are  shown 
in  Table  XI.  Table  rows  indicate  the  number  of  observations  per  replication; 
and  table  columns  show  the  orders  of  the  models,  arranged  by  Increasing  sum 
of  the  autoregressive  and  moving-average  orders.  Each  table  cell  shows  the 
two  applicable  counts,  with  the  TOP  7  and  R  counts  in  the  upper  and  lower 
parts  of  each  cell,  respectively.  For  example,  with  200  observations  per 
replication,  there  were  61  and  79  ARMA(1,0)  models  fitted  for  respective 
TOft  7  and  8.  We  conclude  that  low-order  ARMA  models  produce  statistically 
acceptable  fits  for  queuing  system  data,  at  least  for  the  case  of  the  two 
queuing  TOPi  used  for  testing  purposes  in  this  work. 
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The  results  In  Table  XI  can  also  be  compared  with  the  statement  of  Steudel 
and  Wu  [1977]  that  output  from  an  M/M/1  queuing  system  can  be  fitted  with  an 
ARMA(1,0)  model  and  that.  In  general,  ARMA  models  of  order  (p,p  -  1)  provide 
acceptable  fits  for  outputs  from  queuing-system  simulations*  In  the  TOP  7 
M/M/1  system,  251  of  the  400  total  replications  were  fitted  by  ARMA(1,0) 
models.  And  In  the  TOP  8  M/D/3  queuing  system,  302  of  the  400  total  replica¬ 
tions  were  fitted  by  either  ARMA(1,0)  or  ARMA(2,1)  models. 

6.  CONCLUSIONS 

Two  ARMA-based  confidence-interval  procedures  have  been  described,  and 
results  of  subjecting  these  procedures  to  extensive  testing  have  been  reported. 
The  differences  In  the  performance  characteristics  of  the  alternative  pro¬ 
cedures  are  small.  Both  procedures  work  well  when  used  to  process  ARMA- 
generated  data  for  the  ranges  and  combinations  of  autoregressive  and  moving- 
average  orders  and  parameter  values  Investigated.  Although  Method  1  produces 
somewhat  more  stable  confidence  Intervals  than  Method  2  as  measured  by  MOE  3, 
the  differences  In  the  performance  characteristics  of  the  two  alternative 
procedures  for  tailor-made  data  are  small  on  balance. 

Both  confidence-interval  procedures  perform  In  less-than-satlsfactory 
fashion  when  used  to  process  data  produced  by  two  queuing-system  simulations 
chosen  for  testing  purposes.  The  underlying  cause  for  this  poor  performance 
may  be  the  demonstrated  correlation  between  the  sample  means  and  their 
standard  errors  In  the  queuing  output,  and/or  the  demonstrated  nonnormality 
In  the  distribution  of  the  residuals  associated  with  the  ARMA  models  fitted 
to  the  queuing-system  output.  We  advise  that  practitioners  conduct  appro¬ 
priate  correlation  and  normality  tests  on  simulation  output  prior  to  using 
these  ARMA-based  confidence-interval  procedures. 
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